Read Alignment

Variant Calling

Overview

  • GATK4 best practices
  • Site level filtering:
    • bi-allelic SNPs
    • 95% truth sensitivity (VQSR)
  • Genotype level filtering:
    • DP: 4 - 77 (upper bound = mean DP + 4*SD)
    • GQ >= 20
    • Else missing (./.)
  • Sites retained if at least 1 of 6 groups has >= 20 of 25 non-missing samples: 7,064,770 SNPs
  • Sites retained if all 6 of 6 groups have >= 20 of 25 non-missing samples: 34,033 SNPs
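
The genotype-level and group-level rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the function names are hypothetical, the DP bounds are assumed to be inclusive, and genotypes are assumed to be plain VCF-style strings.

```python
# Thresholds taken from the overview list; inclusive DP bounds are an assumption.
DP_MIN, DP_MAX, GQ_MIN = 4, 77, 20
MIN_CALLED = 20  # minimum non-missing samples (out of 25) for a group to count
MISSING = "./."


def mask_genotype(gt, dp, gq):
    """Return gt if it passes the DP and GQ thresholds, otherwise './.'."""
    if dp is None or gq is None:
        return MISSING
    if DP_MIN <= dp <= DP_MAX and gq >= GQ_MIN:
        return gt
    return MISSING


def groups_passing(site, groups):
    """Count groups with at least MIN_CALLED non-missing genotypes at one site.

    site:   dict mapping sample name -> genotype string (after masking)
    groups: dict mapping group name -> list of sample names
    """
    passing = 0
    for samples in groups.values():
        called = sum(1 for s in samples if site[s] != MISSING)
        if called >= MIN_CALLED:
            passing += 1
    return passing


def keep_site(site, groups, min_groups=1):
    """Keep the site if at least min_groups groups pass the call-rate rule."""
    return groups_passing(site, groups) >= min_groups
```

With `min_groups=1` this corresponds to the "1 of 6 groups" rule, and with `min_groups=6` to the stricter "6 of 6 groups" rule.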

Number of Variants

variable                 raw         site_filtered  gt_filtered
TOTAL_SNPS               16,643,754  9,115,052      7,064,770
NUM_IN_DB_SNP            12,117,419  8,486,322      6,621,301
NOVEL_SNPS               4,526,335   628,730        443,469
PCT_DBSNP                0.73        0.93           0.94
DBSNP_TITV               1.94        2.07           2.19
NOVEL_TITV               1.07        1.53           1.64
TOTAL_INDELS             3,454,237   0              0
TOTAL_MULTIALLELIC_SNPS  856,605     0              0
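
The counts in the table are internally consistent, which a short check makes explicit. The sketch below assumes that PCT_DBSNP is NUM_IN_DB_SNP divided by TOTAL_SNPS and that NOVEL_SNPS is the remainder; the dictionary layout and function name are illustrative only.

```python
# Counts copied from the table above, one entry per filtering stage.
counts = {
    "raw":           {"total": 16_643_754, "in_dbsnp": 12_117_419, "novel": 4_526_335},
    "site_filtered": {"total": 9_115_052,  "in_dbsnp": 8_486_322,  "novel": 628_730},
    "gt_filtered":   {"total": 7_064_770,  "in_dbsnp": 6_621_301,  "novel": 443_469},
}


def pct_dbsnp(total, in_dbsnp):
    """Fraction of SNPs already present in dbSNP, rounded as in the table."""
    return round(in_dbsnp / total, 2)


# Assumed relation: known + novel SNPs add up to the total at every stage.
consistent = {s: c["in_dbsnp"] + c["novel"] == c["total"] for s, c in counts.items()}
pct = {s: pct_dbsnp(c["total"], c["in_dbsnp"]) for s, c in counts.items()}

for stage in counts:
    print(stage, pct[stage], consistent[stage])
```

Under these assumptions the check reproduces the tabulated PCT_DBSNP values of 0.73, 0.93 and 0.94.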

SNPs per sample

Sample metrics

Population metrics

Population VCFs were produced from the cohort's final VCF by extracting each population's samples and then removing sites fixed for the reference allele (GT == "RR") and sites where every genotype is missing (GT == "./.").
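
The per-population site filter can be illustrated with a minimal Python sketch. It is not the script used for the report: the function names are hypothetical, genotypes are assumed to be VCF-style strings such as "0/0", "0/1" or "./.", and a site is treated as fixed for the reference when no called genotype carries a non-reference allele.

```python
def has_alt_allele(gt):
    """True if a genotype string carries at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def keep_in_population(genotypes):
    """Keep a site only if some sample in the population carries an ALT allele.

    This drops both kinds of site named in the text: sites where all called
    genotypes are homozygous reference, and sites where all genotypes are
    missing. Treating a mix of "0/0" and "./." as fixed reference is an
    assumption of this sketch.
    """
    return any(has_alt_allele(gt) for gt in genotypes)
```

For example, `keep_in_population(["0/0", "0/1", "./."])` keeps the site, while a site with only "0/0" genotypes or only "./." genotypes is dropped.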

Intersections (UpSetR)

Missingness

min_called_samples DUK DUC DU6 DU6P DUhLB FZTDU
1 1.00 1.00 1.00 1.00 1.00 1.00
2 1.00 1.00 1.00 1.00 1.00 1.00
3 1.00 1.00 1.00 1.00 1.00 1.00
4 1.00 1.00 1.00 1.00 1.00 1.00
5 1.00 1.00 1.00 1.00 1.00 1.00
6 1.00 1.00 1.00 1.00 1.00 1.00
7 1.00 1.00 1.00 1.00 1.00 1.00
8 1.00 0.99 0.99 1.00 1.00 1.00
9 1.00 0.99 0.99 0.99 1.00 1.00
10 0.99 0.99 0.99 0.99 0.99 1.00
11 0.96 0.99 0.84 0.99 0.97 1.00
12 0.92 0.98 0.66 0.98 0.95 0.99
13 0.88 0.95 0.50 0.96 0.91 0.99
14 0.83 0.91 0.38 0.93 0.86 0.97
15 0.78 0.86 0.27 0.89 0.82 0.94
16 0.73 0.79 0.19 0.85 0.76 0.90
17 0.67 0.70 0.13 0.80 0.71 0.85
18 0.61 0.60 0.08 0.74 0.65 0.79
19 0.54 0.49 0.05 0.67 0.59 0.72
20 0.46 0.38 0.03 0.58 0.52 0.65
21 0.38 0.26 0.01 0.47 0.45 0.57
22 0.29 0.16 0.01 0.34 0.37 0.50
23 0.20 0.08 0.00 0.21 0.28 0.40
24 0.11 0.03 0.00 0.09 0.18 0.29
25 0.03 0.01 0.00 0.03 0.08 0.14
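
One way to read the table is as the fraction of sites in each population that have at least min_called_samples non-missing genotypes. Under that reading (an interpretation, not something the report states), one population's column could be computed roughly as follows; the function name and input format are hypothetical.

```python
def fraction_with_min_called(called_counts, n_samples=25):
    """Map each threshold k (1..n_samples) to the fraction of sites with
    at least k called samples.

    called_counts: one integer per site, the number of non-missing genotypes
                   observed at that site in a single population.
    """
    if not called_counts:
        raise ValueError("called_counts must contain at least one site")
    total = len(called_counts)
    return {
        k: sum(1 for c in called_counts if c >= k) / total
        for k in range(1, n_samples + 1)
    }
```

Because the condition only gets stricter as k grows, the resulting fractions can never increase down the column, which matches the pattern in the table.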

Figures were produced externally from the site-filtered bi-allelic SNPs with the script “./batches123_04_FinalVCF/scripts/15_visualize_missingness.R”.

SNP Annotations

LDD & SFS

Allele frequency state per population

LD Decay

Alternative Allele Frequency Distribution

Minor Allele Frequency Distribution

Diversity

Population Structure

PC1 - PC2

PC2 - PC3

PC3 - PC4

PC4 - PC5

PC5 - PC6

Hierarchical Clustering

Admixture

Admixture detailed

K2

K3

K4

K5

K6

Divergent Signatures

Directional Signatures